Search CORE

9 research outputs found

Recommended from our members

Flexible and efficient computation in large data centres

Author: Gog Ionel Corneliu
Publication venue: University of Cambridge
Publication date: 07/02/2018
Field of study

Increasingly, online computer applications rely on large-scale data analyses to offer personalised and improved products. These large-scale analyses are performed on distributed data processing execution engines that run on thousands of networked machines housed within an individual data centre. These execution engines provide, to the programmer, the illusion of running data analysis workflows on a single machine, and offer programming interfaces that shield developers from the intricacies of implementing parallel, fault-tolerant computations. Many such execution engines exist, but they embed assumptions about the computations they execute, or only target certain types of computations. Understanding these assumptions involves substantial study and experimentation. Thus, developers find it difficult to determine which execution engine is best, and even if they did, they become “locked in” because engineering effort is required to port workflows. In this dissertation, I first argue that in order to execute data analysis computations efficiently, and to flexibly choose the best engines, the way we specify data analysis computations should be decoupled from the execution engines that run the computations. I propose an architecture for decoupling data processing, together with Musketeer, my proof-of-concept implementation of this architecture. In Musketeer, developers express data analysis computations using their preferred programming interface. These are translated into a common intermediate representation from which code is generated and executed on the most appropriate execution engine. I show that Musketeer can be used to write data analysis computations directly, and these can execute on many execution engines because Musketeer automatically generates code that is competitive with optimised hand-written implementations. The diverse execution engines cause different workflow types to coexist within a data centre, opening up both opportunities for sharing and potential pitfalls for co-location interference. However, in practice, workflows are either placed by high-quality schedulers that avoid co-location interference, but choose placements slowly, or schedulers that choose placements quickly, but with unpredictable workflow run time due to co-location interference. In this dissertation, I show that schedulers can choose high-quality placements with low latency. I develop several techniques to improve Firmament, a high-quality min-cost flow-based scheduler, to choose placements quickly in large data centres. Finally, I demonstrate that Firmament chooses placements at least as good as other sophisticated schedulers, but at the speeds associated with simple schedulers. These contributions enable more efficient and effective use of data centres for large-scale computation than current solutions.Google fellowship in distributed system

Apollo (Cambridge)

Firmament: Fast, Centralized Cluster Scheduling at Scale

Author: Gog Ionel
Schwarzkopf Malte
Gleave A
Watson Robert
Hand S
Publication venue: Symposium on Operating Systems Design and Implementation
Publication date: 04/11/2016
Field of study

Centralized datacenter schedulers can make high-quality placement decisions when scheduling tasks in a cluster. Today, however, high-quality placements come at the cost of high latency at scale, which degrades response time for interactive tasks and reduces cluster utilization. This paper describes Firmament, a centralized scheduler that scales to over ten thousand machines at sub- second placement latency even though it continuously reschedules all tasks via a min-cost max-flow (MCMF) optimization. Firmament achieves low latency by using multiple MCMF algorithms, by solving the problem incrementally, and via problem-specific optimizations. Experiments with a Google workload trace from a 12,500-machine cluster show that Firmament improves placement latency by 20 x over Quincy [22], a prior centralized scheduler using the same MCMF optimiza- tion. Moreover, even though Firmament is centralized, it matches the placement latency of distributed schedulers for workloads of short tasks. Finally, Firmament exceeds the placement quality of four widely-used central- ized and distributed schedulers on a real-world cluster, and hence improves batch task response time by 6 x.This work was supported by a Google European Doc- toral Fellowship, by NSF award CNS-1413920, and by the Defense Advanced Research Projects Agency (DARPA) and Air Force Research Laboratory (AFRL), under contract FA8750-11-C-0249

Biblioteca Digital de la Comunidad de Madrid

Apollo (Cambridge)

Learning Scheduling Algorithms for Data Processing Clusters

Author: Abadi Martín
Addanki Ravichandra
Dai Hanjun
Finn Chelsea
Ghodsi Ali
Gog Ionel
Grandl Robert
Greensmith Evan
Hindman Benjamin
Kingma Diederik P
Mao Hongzi
Mao Hongzi
Marcus Ryan
Mirhoseini Azalia
Mirhoseini Azalia
Pinto Lerrel
Schulman John
Spark Apache
Sutton S.
Weaver Lex
Zaharia Matei
Publication venue
Publication date: 21/08/2019
Field of study

Efficiently scheduling data processing jobs on distributed compute clusters requires complex algorithms. Current systems, however, use simple generalized heuristics and ignore workload characteristics, since developing and tuning a scheduling policy for each workload is infeasible. In this paper, we show that modern machine learning techniques can generate highly-efficient policies automatically. Decima uses reinforcement learning (RL) and neural networks to learn workload-specific scheduling algorithms without any human instruction beyond a high-level objective such as minimizing average job completion time. Off-the-shelf RL techniques, however, cannot handle the complexity and scale of the scheduling problem. To build Decima, we had to develop new representations for jobs' dependency graphs, design scalable RL models, and invent RL training methods for dealing with continuous stochastic job arrivals. Our prototype integration with Spark on a 25-node cluster shows that Decima improves the average job completion time over hand-tuned scheduling heuristics by at least 21%, achieving up to 2x improvement during periods of high cluster load

arXiv.org e-Print Archive

Crossref

DSpace@MIT

Energy-Efficient Speculative Execution using Advanced Reservation for Heterogeneous Clusters

Author: Ananthanarayanan Ganesh
Ananthanarayanan Ganesh
Ananthanarayanan Ganesh
Gog Ionel
Jeffrey Dean
Zaharia Matei
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 13/08/2018
Field of study

International audienceMany Big Data processing applications nowadays run on large-scale multi-tenant clusters. Due to hardware heterogeneity and resource contentions, straggler problem has become the norm rather than the exception in such clusters. To handle the straggler problem, speculative execution has emerged as one of the most widely used straggler mitigation techniques. Although a number of speculative execution mechanisms have been proposed, as we have observed from real-world traces, the questions of ``when'' and ``where'' to launch speculative copies have not been fully discussed and hence cause inefficiencies on the performance and energy of Big Data applications. In this paper, we propose a performance model and an energy consumption model to reveal the performance and energy variations with different speculative execution solutions. We further propose a window-based dynamic resource reservation and a heterogeneity-aware copy allocation technique to answer the ``when'' and ``where'' questions for speculative executions. Evaluations using real-world traces show that our proposed technique can improve the performance of Big Data applications by up to 30% and reduce the overall energy consumption by up to 34%

Crossref

INRIA a CCSD electronic archive server

Energy efficiency of large scale graph processing platforms

Author: Gog Ionel
Gonzalez Joseph E.
Leskovec Jure
Rini
Software Foundation The Apache
Software Foundation The Apache
Zaharia Matei
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016
Field of study

A number of graph processing platforms have emerged recently as a result of the growing demand on graph data analytics with complex and large-scale graph structured datasets. These platforms have been tailored for iterative graph computations and can offer an order of magnitude performance gain over generic data-flow frameworks like Apache Hadoop and Spark. Nevertheless, the increasing availability of such platforms and their functionality overlap necessitates a comparative study on various aspects of the platforms, including applications, performance and energy efficiency. In this work, we focus on the energy efficiency aspect of some large scale graph processing platforms. Specifically, we select two representatives, e.g., Apache Giraph and Spark GraphX, for the comparative study. We compare and analyze the energy consumption of these two platforms with PageRank, Strongly Connected Component and Single Source Shortest Path algorithms over five different realistic graphs. Our experimental results demonstrate that GraphX outperforms Giraph in terms of energy consumption. Specifically, Giraph consumes 1.71 times more energy than GraphX on average for the mentioned algorithms

Crossref

VTT Research System

CERN Document Server

Charon: Specialized near-memory processing architecture for clearing dead objects in memory

Author: Akin Berkin
Amit Nadav
Bu Yingyi
Eckert Yasuko
Gog Ionel
Hadidi Ramyad
Hertz Matthew
Joao José A.
Kyrola Aapo
Li S.
Li Sheng
Ma Justin
Nai L.
Nguyen Khanh
Ogleari M.
Poremba Matthew
Pugsley Seth H.
Wise David S.
Witawas
Wright Greg
Yang Ting
Zaharia Matei
Zhang M.
Publication venue: MICRO
Publication date: 01/10/2019
Field of study

© 2019 Association for Computing Machinery.programming, saving a programmer from many nasty memoryrelated bugs. However, these productivity benefits come with a cost in terms of application throughput, worst-case latency, and energy consumption. Since the first introduction of GC by the Lisp programming language in the 1950s, a myriad of hardware and software techniques have been proposed to reduce this cost. While the idea of accelerating GC in hardware is appealing, its impact has been very limited due to narrow coverage, lack of flexibility, intrusive system changes, and significant hardware cost. Even with specialized hardware GC performance is eventually limited by memory bandwidth bottleneck. Fortunately, emerging 3D stacked DRAM technologies shed new light on this decades-old problem by enabling efficient near-memory processing with ample memory bandwidth. Thus, we propose Charon1, the first 3D stacked memory-based GC accelerator. Through a detailed performance analysis of HotSpot JVM, we derive a set of key algorithmic primitives based on their GC time coverage and implementation complexity in hardware. Then we devise a specialized processing unit to substantially improve their memory-level parallelism and throughput with a low hardware cost. Our evaluation of Charon with the full-production HotSpot JVM running two big data analytics frameworks, Spark and GraphChi, demonstrates a 3.29 geomean speedup and 60.7% energy savings for GC over the baseline 8-core out-of-order processor.N

Crossref

SNU Open Repository and Archive

Kubernetes cluster optimization using hybrid shared-state scheduling framework

Author: Boutin Eric
Burns B.
Chunikhin Oleg
Cérin C.
Docker
Etcd
Fluentd
Gog Ionel
Grillet Armand
Hindman Benjamin
Idowu Toye
Incubator Kubernetes
Kubernetes
Kubernetes
Kubernetes GitHub
Lewis Ian
Loiselle Sean
Ngnix
Org Mesos Apache
Poseidon
Project The Helm
Project Vault
Repository Kubernetes
Schwarzkopf Malte
Swarm
Tornow Dominik
Tron-Lozai César
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019
Field of study

This paper presents a novel approach for scheduling the workloads in a Kubernetes cluster, which are sometimes unequally distributed across the environment or deal with fluctuations in terms of resources utilization. Our proposal looks at a hybrid shared-state scheduling framework model that delegates most of the tasks to the distributed scheduling agents and has a scheduling correction function that mainly processes the unscheduled and unprioritized tasks. The scheduling decisions are made based on the entire cluster state which is synchronized and periodically updated by a master-state agent. By preserving the Kubernetes objects and concepts, we analyzed the proposed scheduler behavior under different scenarios, for instance we tested the failover/recovery behavior in a deployed Kubernetes cluster. Moreover, our findings show that in situations like collocation interference or priority preemption, other centralized scheduling frameworks integrated with the Kubernetes system might not perform accordingly due to high latency derived from the calculation of load spreading. In a wider context of the existing scheduling frameworks for container clusters, the distributed models lack of visibility at an upper-level scheduler might generate conflicting job placements. Therefore, our proposed scheduler encompasses the functionality of both centralized and distributed frameworks. By employing a synchronized cluster state, we ensure an optimal scheduling mechanism for the resources utilization.Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.Network Architectures and Service

Crossref

TU Delft Repository